
Load results

load("./processed_data_files/what_we_find_VS_ELM_clust20171019.RData")
rm(list = ls()[!ls() %in% c("printTable","XYZ.p.adjust","res_count", "sequential_filter")]) # keep only the objects used below
doman_viral_pairs = TRUE # if FALSE, add human proteins containing the domains
motifs = TRUE # use motifs based on Vidal's data

Empirical p-value for seeing a domain a given number of times

What is the chance of randomly seeing a given domain the observed number of times among all human proteins that interact with a specific viral protein?
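A minimal sketch of such an empirical p-value, assuming a permutation null in which interactor sets of the same size are drawn at random from a background of human proteins. The function `empirical_domain_p` and the logical vector `has_domain` are hypothetical illustrations, not the code that produced `res_count`.

```r
# Hypothetical sketch (not the pipeline's code): permutation-based p-value for
# observing a domain `observed` times among `n_interactors` human proteins.
# `has_domain` is assumed to be a logical vector with one entry per background
# human protein, TRUE if that protein contains the domain.
empirical_domain_p <- function(observed, n_interactors, has_domain, N = 1000) {
  # Null distribution: domain counts in random interactor sets of the same size
  null_counts <- replicate(N, sum(sample(has_domain, n_interactors)))
  # Add-one correction so the p-value is never exactly zero
  (1 + sum(null_counts >= observed)) / (N + 1)
}

set.seed(1)
has_domain <- c(rep(TRUE, 50), rep(FALSE, 950))  # toy background: 50 of 1000 proteins
p <- empirical_domain_p(observed = 10, n_interactors = 40, has_domain = has_domain)
```

With `N` permutations the smallest attainable value is `1 / (N + 1)`, so `N` bounds how small an empirical p-value can get before FDR correction.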

Only significant domains with significant motifs

printTable(res_count, doman_viral_pairs = doman_viral_pairs, motifs = motifs,
           destfile = "./results/domains_fdr_corrected_only_with_significant_motifs_empirical_p_value.tsv",
           only_with_motifs = TRUE, fdr_motifs = 0.05)
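The `fdr_motifs = 0.05` argument suggests a false discovery rate cutoff on motif p-values. A minimal sketch of that kind of filter with base R's Benjamini-Hochberg correction, assuming a hypothetical table with columns `domain_p` and `motif_p`; whether `printTable` or `XYZ.p.adjust` actually uses BH is not stated here.

```r
# Hypothetical sketch: keep viral protein - human domain pairs whose domain AND
# motif p-values both pass a Benjamini-Hochberg FDR cutoff.
# Column names `domain_p` and `motif_p` are assumptions for illustration.
keep_significant <- function(df, fdr_domains = 0.05, fdr_motifs = 0.05) {
  df$domain_fdr <- p.adjust(df$domain_p, method = "BH")
  df$motif_fdr  <- p.adjust(df$motif_p,  method = "BH")
  df[df$domain_fdr < fdr_domains & df$motif_fdr < fdr_motifs, ]
}

toy_pairs <- data.frame(pair     = c("A", "B", "C", "D"),
                        domain_p = c(0.001, 0.002, 0.04, 0.5),
                        motif_p  = c(0.001, 0.3, 0.002, 0.01))
keep_significant(toy_pairs)  # only pair "A" passes both cutoffs
```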

Only significant domains with motifs

printTable(res_count, doman_viral_pairs = doman_viral_pairs, motifs = motifs,
           destfile = "./results/domains_fdr_corrected_only_with_motifs_empirical_p_value.tsv",
           only_with_motifs = TRUE)

Two-step filtering, ranked by Fisher's exact test p-value

printTable(sequential_filter, doman_viral_pairs = doman_viral_pairs, motifs = motifs,
           destfile = "./results/domains_fdr_corrected_only_with_motifs_sequential_filter.tsv",
           only_with_motifs = TRUE)
plot(sequential_filter, IDs_interactor_viral + IDs_domain_human ~ p.value,
     xlab = "Fisher's exact test p-value",
     breaks = seq(-0.01, 1.01, 0.01))
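A minimal sketch of the Fisher's exact test and ranking step, assuming a one-sided 2x2 table of human proteins (interactors of the viral protein vs. the rest of the background, with vs. without the domain). The exact contingency table behind `sequential_filter` and its first filtering step are not described here, so `fisher_domain_p` and its arguments are hypothetical.

```r
# Hypothetical sketch: one-sided Fisher's exact test for a viral protein -
# human domain pair. Counts are of human proteins (assumed layout):
#   int_with:  interactors of the viral protein that contain the domain
#   int_total: all interactors of the viral protein
#   bg_with:   background proteins containing the domain (interactors included)
#   bg_total:  all background proteins (interactors included)
fisher_domain_p <- function(int_with, int_total, bg_with, bg_total) {
  tab <- matrix(
    c(int_with, int_total - int_with,                   # interactors: domain yes / no
      bg_with - int_with,                               # non-interactors with domain
      (bg_total - int_total) - (bg_with - int_with)),   # non-interactors without domain
    nrow = 2,
    dimnames = list(domain = c("yes", "no"), group = c("interactor", "other"))
  )
  fisher.test(tab, alternative = "greater")$p.value
}

# Toy pairs, scored and then ranked by p-value (smallest first)
toy_fisher <- data.frame(pair = c("V1-SH3", "V2-PDZ"),
                         int_with = c(10, 2), int_total = c(40, 40),
                         bg_with = c(50, 50), bg_total = c(1000, 1000))
toy_fisher$p.value <- mapply(fisher_domain_p, toy_fisher$int_with, toy_fisher$int_total,
                             toy_fisher$bg_with, toy_fisher$bg_total)
toy_fisher[order(toy_fisher$p.value), ]
```

The one-sided alternative tests only for over-representation of the domain among interactors, which matches the question of which domains a viral protein targets.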

PermutResult2D(res = sequential_filter, N = 500, value.cols = c("p.value", "Emp.p.value")) +
    ggtitle("2D-bin plots of 250 top-scoring viral protein - human domain pairs, \n statistic: count of a domain among interacting partners of a viral protein")
## Warning in plyr::split_indices(scale_id, n): '.Random.seed' is not an
## integer vector but of type 'NULL', so ignored
## Warning: Removed 242 rows containing non-finite values (stat_density).
## Warning in (function (data, mapping, alignPercent = 0.6, method =
## "pearson", : Removed 242 rows containing missing values
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 278 rows containing non-finite values (stat_bin2d).
## Warning: Removed 242 rows containing non-finite values (stat_density).

How well do these methods find ELM domains?

The absolute numbers

The total number of domains found compared with the total number of domains in ELM
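A minimal sketch of these absolute numbers, assuming each method's output and the ELM reference are available as character vectors of domain identifiers. `elm_recovery` and the toy domain names are hypothetical.

```r
# Hypothetical sketch: absolute numbers for one method.
# `found` and `elm` are assumed to be character vectors of domain IDs.
elm_recovery <- function(found, elm) {
  found <- unique(found)
  elm <- unique(elm)
  recovered <- intersect(found, elm)
  c(found           = length(found),
    elm_total       = length(elm),
    recovered       = length(recovered),
    fraction_of_elm = length(recovered) / length(elm))
}

elm <- c("SH3", "PDZ", "WW", "BRCT")  # toy ELM reference set
found <- c("SH3", "PDZ", "Pkinase")   # toy output of one method
elm_recovery(found, elm)
```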

The enrichment in ELM domains over the background
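One way to express this enrichment is the fraction of ELM domains among the domains a method finds, divided by the fraction of ELM domains in the background. A minimal sketch under that assumed definition; `elm_enrichment` and the toy sets are hypothetical.

```r
# Hypothetical sketch: enrichment = (ELM fraction among found domains) /
#                                   (ELM fraction among background domains).
# All inputs are assumed to be character vectors of domain IDs.
elm_enrichment <- function(found, elm, background) {
  found <- unique(found)
  background <- unique(background)
  # Only ELM domains present in the background count towards the baseline
  elm_bg <- intersect(unique(elm), background)
  frac_found <- length(intersect(found, elm_bg)) / length(found)
  frac_background <- length(elm_bg) / length(background)
  frac_found / frac_background
}

background <- paste0("D", 1:100)    # toy background of 100 domains
elm <- background[1:10]             # 10 of them are ELM domains
found <- background[c(1:4, 51:54)]  # 8 found, 4 of them in ELM
elm_enrichment(found, elm, background)  # (4/8) / (10/100) = 5
```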

P-value for the enrichment in ELM domains over the background
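A common choice for this p-value is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the corresponding 2x2 table). A minimal sketch, assuming the same four counts as in the enrichment ratio; `elm_enrichment_p` is hypothetical and not necessarily the test used for the results above.

```r
# Hypothetical sketch: one-sided hypergeometric p-value for ELM enrichment.
#   N: domains in the background     K: ELM domains in the background
#   n: domains found by a method     k: ELM domains among those found
elm_enrichment_p <- function(k, n, K, N) {
  # P(X >= k) for X ~ Hypergeometric(K successes, N - K failures, n draws)
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

elm_enrichment_p(k = 4, n = 8, K = 10, N = 100)
```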